Overview

Dataset statistics

Number of variables: 13
Number of observations: 29101
Missing cells: 3043
Missing cells (%): 0.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.9 MiB
Average record size in memory: 104.0 B
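
The percentages and sizes above can be cross-checked from the raw counts. The sketch below assumes that "Missing cells (%)" is taken relative to all cells (variables times observations) and that the total size is the record size times the row count; both assumptions are inferred from the figures, not stated by the report.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 13
n_observations = 29101
missing_cells = 3043
record_bytes = 104.0

# Assumption: the missing percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total memory is record size times number of rows, in MiB.
total_mib = record_bytes * n_observations / 2**20

print(f"{total_cells} cells, {missing_pct:.1f}% missing, {total_mib:.1f} MiB")
```

Under these assumptions the results round to the reported 0.8% and 2.9 MiB.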

Variable types

NUM: 10
CAT: 2
BOOL: 1

Reproduction

Analysis started: 2021-12-05 15:56:23.501755
Analysis finished: 2021-12-05 15:56:32.413725
Duration: 8.91 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

pickup_dt has a high cardinality: 4343 distinct values [High cardinality]
borough has 3043 (10.5%) missing values [Missing]
pickup_dt is uniformly distributed [Uniform]
borough is uniformly distributed [Uniform]
pickups has 5567 (19.1%) zeros [Zeros]
spd has 3596 (12.4%) zeros [Zeros]
dewp has 303 (1.0%) zeros [Zeros]
pcp01 has 26468 (91.0%) zeros [Zeros]
pcp06 has 23460 (80.6%) zeros [Zeros]
pcp24 has 18631 (64.0%) zeros [Zeros]
sd has 20167 (69.3%) zeros [Zeros]

Variables

pickup_dt
Categorical

HIGH CARDINALITY
UNIFORM

Distinct count: 4343
Unique (%): 14.9%
Missing: 0
Missing (%): 0.0%
Memory size: 227.4 KiB

Value                Count  Frequency (%)
2015-01-01 01:00:00  7      < 0.1%
2015-04-26 10:00:00  7      < 0.1%
2015-04-26 14:00:00  7      < 0.1%
2015-04-26 15:00:00  7      < 0.1%
2015-04-26 16:00:00  7      < 0.1%
2015-04-26 17:00:00  7      < 0.1%
2015-04-26 18:00:00  7      < 0.1%
2015-04-26 19:00:00  7      < 0.1%
2015-04-26 20:00:00  7      < 0.1%
2015-04-26 21:00:00  7      < 0.1%
Other values (4333)  29031  99.8%

Length

Max length: 19
Median length: 19
Mean length: 19
Min length: 19

borough
Categorical

MISSING
UNIFORM

Distinct count: 6
Unique (%): < 0.1%
Missing: 3043
Missing (%): 10.5%
Memory size: 227.4 KiB

Value          Count  Frequency (%)
Bronx          4343   14.9%
Brooklyn       4343   14.9%
EWR            4343   14.9%
Manhattan      4343   14.9%
Queens         4343   14.9%
Staten Island  4343   14.9%
(Missing)      3043   10.5%

Length

Max length: 13
Median length: 6
Mean length: 6.880210302
Min length: 3
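
The length statistics can be reproduced from the frequency table. The sketch below assumes that each missing borough is counted as the three-character string "nan"; the report does not say this, but the assumption reproduces the reported mean and median.

```python
from statistics import mean, median

# Counts copied from the borough frequency table above.
counts = {
    "Bronx": 4343,
    "Brooklyn": 4343,
    "EWR": 4343,
    "Manhattan": 4343,
    "Queens": 4343,
    "Staten Island": 4343,
}
missing = 3043

# One string length per row of the dataset.
lengths = [len(name) for name, n in counts.items() for _ in range(n)]
# Assumption: a missing value is measured as the string "nan" (length 3).
lengths += [len("nan")] * missing

print(len(lengths), min(lengths), median(lengths), max(lengths))
print(f"{mean(lengths):.9f}")
```

The row count also confirms the table: 6 boroughs times 4343 hourly timestamps plus 3043 missing rows gives the 29101 observations.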

pickups
Real number (ℝ≥0)

ZEROS

Distinct count: 3406
Unique (%): 11.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 490.2159032335659
Minimum: 0
Maximum: 7883
Zeros: 5567
Zeros (%): 19.1%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 54
Q3: 449
95-th percentile: 2840
Maximum: 7883
Range: 7883
Interquartile range (IQR): 448

Descriptive statistics

Standard deviation: 995.6495355
Coefficient of variation (CV): 2.031042912
Kurtosis: 9.26766556
Mean: 490.2159032
Median Absolute Deviation (MAD): 54
Skewness: 2.976238116
Sum: 14265773
Variance: 991317.9975
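
Several of these statistics are derived from the others, which makes the table easy to sanity-check. The sketch below uses the standard definitions (CV as standard deviation over mean, variance as the squared standard deviation, IQR as Q3 minus Q1); that the report uses exactly these definitions is an assumption, although the numbers agree.

```python
# Figures copied from the pickups statistics above.
n_observations = 29101
total = 14265773
mean_value = 490.2159032
std = 995.6495355
q1, q3 = 1, 449
minimum, maximum = 0, 7883

cv = std / mean_value          # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
mean_from_sum = total / n_observations

print(f"CV={cv:.6f} variance={variance:.2f} IQR={iqr} range={value_range}")
print(f"mean from sum={mean_from_sum:.6f}")
```

A CV above 2 together with a median (54) far below the mean (about 490) reflects the strong right skew already reported by the skewness value.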
Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
0                    5567   19.1%
1                    2656   9.1%
2                    1698   5.8%
3                    937    3.2%
4                    474    1.6%
5                    257    0.9%
6                    128    0.4%
36                   85     0.3%
45                   84     0.3%
32                   81     0.3%
Other values (3396)  17134  58.9%

Value  Count  Frequency (%)
0      5567   19.1%
1      2656   9.1%
2      1698   5.8%
3      937    3.2%
4      474    1.6%

Value  Count  Frequency (%)
7883   1      < 0.1%
7801   1      < 0.1%
7711   1      < 0.1%
7512   1      < 0.1%
7271   1      < 0.1%

spd
Real number (ℝ≥0)

ZEROS

Distinct count: 114
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.98492418031781
Minimum: 0.0
Maximum: 21.0
Zeros: 3596
Zeros (%): 12.4%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
Median: 6
Q3: 8
95-th percentile: 13
Maximum: 21
Range: 21
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.699007242
Coefficient of variation (CV): 0.6180541525
Kurtosis: 0.4192409725
Mean: 5.98492418
Median Absolute Deviation (MAD): 2
Skewness: 0.4190693213
Sum: 174167.2786
Variance: 13.68265458

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
5                   3816   13.1%
0                   3596   12.4%
6                   3545   12.2%
3                   3432   11.8%
7                   3021   10.4%
8                   2574   8.8%
9                   1592   5.5%
10                  1369   4.7%
11                  936    3.2%
13                  557    1.9%
Other values (104)  4663   16.0%

Value  Count  Frequency (%)
0      3596   12.4%
0.6    21     0.1%
0.75   42     0.1%
1      66     0.2%
1.2    21     0.1%

Value  Count  Frequency (%)
21     7      < 0.1%
20     32     0.1%
18     71     0.2%
17.5   7      < 0.1%
17     120    0.4%

vsb
Real number (ℝ≥0)

Distinct count: 179
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.818124896706218
Minimum: 0.0
Maximum: 10.0
Zeros: 6
Zeros (%): < 0.1%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2.575
Q1: 9.1
Median: 10
Q3: 10
95-th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 0.9

Descriptive statistics

Standard deviation: 2.442897359
Coefficient of variation (CV): 0.2770313856
Kurtosis: 2.898539633
Mean: 8.818124897
Median Absolute Deviation (MAD): 0
Skewness: -2.042058313
Sum: 256616.2526
Variance: 5.967747505

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
10                  21578  74.1%
9.1                 1137   3.9%
8                   845    2.9%
7                   780    2.7%
6                   560    1.9%
4                   403    1.4%
5                   395    1.4%
3                   267    0.9%
0.3                 127    0.4%
2.833333333         108    0.4%
Other values (169)  2901   10.0%

Value         Count  Frequency (%)
0             6      < 0.1%
0.3           127    0.4%
0.3333333333  6      < 0.1%
0.3666666667  7      < 0.1%
0.4           20     0.1%

Value        Count  Frequency (%)
10           21578  74.1%
9.775        7      < 0.1%
9.7          20     0.1%
9.55         72     0.2%
9.333333333  14     < 0.1%

temp
Real number (ℝ≥0)

Distinct count: 295
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47.669042055501286
Minimum: 2.0
Maximum: 89.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 2
5-th percentile: 18
Q1: 32
Median: 46
Q3: 64.5
95-th percentile: 79.5
Maximum: 89
Range: 87
Interquartile range (IQR): 32.5

Descriptive statistics

Standard deviation: 19.81496901
Coefficient of variation (CV): 0.4156779359
Kurtosis: -1.037412126
Mean: 47.66904206
Median Absolute Deviation (MAD): 16
Skewness: 0.05575251227
Sum: 1387216.793
Variance: 392.6329968

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
37                  675    2.3%
36                  595    2.0%
35                  548    1.9%
42                  531    1.8%
38                  529    1.8%
34                  514    1.8%
27                  508    1.7%
61                  502    1.7%
41                  494    1.7%
39                  494    1.7%
Other values (285)  23711  81.5%

Value  Count  Frequency (%)
2      20     0.1%
3      14     < 0.1%
4      85     0.3%
5      33     0.1%
6      41     0.1%

Value  Count  Frequency (%)
89     28     0.1%
88     56     0.2%
87     54     0.2%
86     61     0.2%
85     144    0.5%

dewp
Real number (ℝ)

ZEROS

Distinct count: 305
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.823064908586023
Minimum: -16.0
Maximum: 73.0
Zeros: 303
Zeros (%): 1.0%
Memory size: 227.4 KiB

Quantile statistics

Minimum: -16
5-th percentile: -2
Q1: 14
Median: 30
Q3: 50
95-th percentile: 64.66666667
Maximum: 73
Range: 89
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 21.28344434
Coefficient of variation (CV): 0.6905038288
Kurtosis: -1.035223571
Mean: 30.82306491
Median Absolute Deviation (MAD): 18
Skewness: 0.0154181971
Sum: 896982.0119
Variance: 452.9850028

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
39                  578    2.0%
22                  525    1.8%
18                  518    1.8%
56                  508    1.7%
61                  492    1.7%
25                  486    1.7%
40                  473    1.6%
20                  465    1.6%
10                  460    1.6%
60                  450    1.5%
Other values (295)  24146  83.0%

Value  Count  Frequency (%)
-16    34     0.1%
-15    27     0.1%
-13    86     0.3%
-12    98     0.3%
-11    121    0.4%

Value  Count  Frequency (%)
73     7      < 0.1%
72     14     < 0.1%
71.5   12     < 0.1%
71.25  7      < 0.1%
71     70     0.2%

slp
Real number (ℝ≥0)

Distinct count: 413
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1017.81793752792
Minimum: 991.4
Maximum: 1043.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 991.4
5-th percentile: 1005.3
Q1: 1012.5
Median: 1018.2
Q3: 1022.9
95-th percentile: 1030
Maximum: 1043.4
Range: 52
Interquartile range (IQR): 10.4

Descriptive statistics

Standard deviation: 7.76879558
Coefficient of variation (CV): 0.007632794917
Kurtosis: 0.06914463865
Mean: 1017.817938
Median Absolute Deviation (MAD): 5.2
Skewness: 0.05284461782
Sum: 29619519.8
Variance: 60.35418476

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
1020                269    0.9%
1020.5              255    0.9%
1019.9              237    0.8%
1020.9              227    0.8%
1021.1              226    0.8%
1022.7              214    0.7%
1020.2              213    0.7%
1020.3              209    0.7%
1020.7              204    0.7%
1021.2              204    0.7%
Other values (403)  26843  92.2%

Value  Count  Frequency (%)
991.4  7      < 0.1%
991.6  7      < 0.1%
992.3  7      < 0.1%
992.9  7      < 0.1%
993.4  7      < 0.1%

Value   Count  Frequency (%)
1043.4  7      < 0.1%
1043.3  7      < 0.1%
1043.2  7      < 0.1%
1043.1  7      < 0.1%
1042.9  6      < 0.1%

pcp01
Real number (ℝ≥0)

ZEROS

Distinct count: 80
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.003830149021224929
Minimum: 0.0
Maximum: 0.28
Zeros: 26468
Zeros (%): 91.0%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0.02
Maximum: 0.28
Range: 0.28
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.01893306515
Coefficient of variation (CV): 4.943166713
Kurtosis: 87.81998828
Mean: 0.003830149021
Median Absolute Deviation (MAD): 0
Skewness: 8.220954559
Sum: 111.4611667
Variance: 0.0003584609559

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
0                  26468  91.0%
0.01               439    1.5%
0.02               181    0.6%
0.03               147    0.5%
0.005              141    0.5%
0.05               115    0.4%
0.003333333333     95     0.3%
0.04               79     0.3%
0.015              78     0.3%
0.06               75     0.3%
Other values (70)  1283   4.4%

Value           Count  Frequency (%)
0               26468  91.0%
0.0025          40     0.1%
0.003333333333  95     0.3%
0.005           141    0.5%
0.006666666667  62     0.2%

Value         Count  Frequency (%)
0.28          21     0.1%
0.2675        7      < 0.1%
0.26          7      < 0.1%
0.2533333333  7      < 0.1%
0.25          7      < 0.1%

pcp06
Real number (ℝ≥0)

ZEROS

Distinct count: 318
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02612874128036837
Minimum: 0.0
Maximum: 1.24
Zeros: 23460
Zeros (%): 80.6%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0.1875
Maximum: 1.24
Range: 1.24
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.09312533965
Coefficient of variation (CV): 3.564095899
Kurtosis: 47.35606139
Mean: 0.02612874128
Median Absolute Deviation (MAD): 0
Skewness: 5.936438429
Sum: 760.3725
Variance: 0.008672328884

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   23460  80.6%
0.01                841    2.9%
0.02                258    0.9%
0.03                177    0.6%
0.05                152    0.5%
0.005               121    0.4%
0.04                110    0.4%
0.06                95     0.3%
0.08                92     0.3%
0.003333333333      87     0.3%
Other values (308)  3708   12.7%

Value           Count  Frequency (%)
0               23460  80.6%
0.0025          60     0.2%
0.003333333333  87     0.3%
0.005           121    0.4%
0.006666666667  33     0.1%

Value  Count  Frequency (%)
1.24   6      < 0.1%
1.22   6      < 0.1%
1.21   7      < 0.1%
1.083  7      < 0.1%
1.018  7      < 0.1%

pcp24
Real number (ℝ≥0)

ZEROS

Distinct count: 484
Unique (%): 1.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.09046437121290214
Minimum: 0.0
Maximum: 2.1
Zeros: 18631
Zeros (%): 64.0%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0.05
95-th percentile: 0.5755
Maximum: 2.1
Range: 2.1
Interquartile range (IQR): 0.05

Descriptive statistics

Standard deviation: 0.2194022017
Coefficient of variation (CV): 2.425288528
Kurtosis: 16.22082135
Mean: 0.09046437121
Median Absolute Deviation (MAD): 0
Skewness: 3.605783873
Sum: 2632.603667
Variance: 0.04813732611

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   18631  64.0%
0.01                1259   4.3%
0.05                367    1.3%
0.02                306    1.1%
0.09                271    0.9%
0.08333333333       256    0.9%
0.06                225    0.8%
0.08                217    0.7%
0.1793333333        152    0.5%
0.03                149    0.5%
Other values (474)  7268   25.0%

Value           Count  Frequency (%)
0               18631  64.0%
0.0025          72     0.2%
0.003333333333  91     0.3%
0.005           87     0.3%
0.005833333333  107    0.4%

Value        Count  Frequency (%)
2.1          13     < 0.1%
1.89         7      < 0.1%
1.503833333  64     0.2%
1.493833333  7      < 0.1%
1.49         7      < 0.1%

sd
Real number (ℝ≥0)

ZEROS

Distinct count: 421
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.5291692438976896
Minimum: 0.0
Maximum: 19.0
Zeros: 20167
Zeros (%): 69.3%
Memory size: 227.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 2.958333333
95-th percentile: 12.16666667
Maximum: 19
Range: 19
Interquartile range (IQR): 2.958333333

Descriptive statistics

Standard deviation: 4.520325424
Coefficient of variation (CV): 1.787276765
Kurtosis: 1.313944097
Mean: 2.529169244
Median Absolute Deviation (MAD): 0
Skewness: 1.589743978
Sum: 73601.35417
Variance: 20.43334194

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
0                   20167  69.3%
8                   1934   6.6%
11                  362    1.2%
12                  345    1.2%
9                   334    1.1%
7                   182    0.6%
1                   181    0.6%
13                  180    0.6%
2                   175    0.6%
0.75                40     0.1%
Other values (411)  5201   17.9%

Value          Count  Frequency (%)
0              20167  69.3%
0.04166666667  19     0.1%
0.04166666667  13     < 0.1%
0.08333333333  13     < 0.1%
0.08333333333  19     0.1%

Value        Count  Frequency (%)
19           7      < 0.1%
18.95833333  7      < 0.1%
18.91666667  6      < 0.1%
18.875       6      < 0.1%
18.83333333  7      < 0.1%

hday
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 227.4 KiB

Value  Count  Frequency (%)
N      27980  96.1%
Y      1121   3.9%
 

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
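
The definition can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` and the toy data are invented for the example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of centred products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy data (hypothetical, not taken from the dataset).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

r_same = pearson_r(xs, ys)                          # perfectly linear, rising
r_shifted = pearson_r(xs, [3 * y + 7 for y in ys])  # location and scale changed
r_flipped = pearson_r(xs, [-y for y in ys])         # perfectly linear, falling
print(r_same, r_shifted, r_flipped)
```

Shifting and rescaling ys leaves r unchanged, while negating it flips the sign, which is the invariance described above.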

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
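
As a minimal sketch of this recipe, the code below ranks both variables (ties share the average of their positions, a common convention assumed here) and then applies the covariance-over-standard-deviations formula to the ranks. The function names and toy data are invented for the example.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Correlation of the rank variables of xs and ys."""
    return correlation(average_ranks(xs), average_ranks(ys))

# Toy data (hypothetical): monotonic but strongly nonlinear.
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
rho = spearman_rho(xs, ys)
r = correlation(xs, ys)
print(rho, r)
```

On this cubic relationship ρ reaches 1 while the linear coefficient on the raw values stays below 1, which illustrates why ρ is better at catching nonlinear monotonic association.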

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
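
The pair-counting recipe translates directly into code. The sketch below implements exactly the formula stated above (often called tau-a); statistical libraries frequently report tie-adjusted variants instead, so results can differ when the data contain ties. The function name and toy data are invented for the example.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A zero product is a tie and counts as neither.
    return (concordant - discordant) / len(pairs)

# Toy data (hypothetical): one swapped pair out of six.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)
print(tau)
```

Here five pairs are concordant and one is discordant, so τ = (5 - 1) / 6.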

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the Phik package.

Missing values


Sample

First rows

       pickup_dt            borough        pickups  spd  vsb   temp  dewp  slp     pcp01  pcp06  pcp24  sd   hday
0      2015-01-01 01:00:00  Bronx          152      5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
1      2015-01-01 01:00:00  Brooklyn       1519     5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
2      2015-01-01 01:00:00  EWR            0        5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
3      2015-01-01 01:00:00  Manhattan      5258     5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
4      2015-01-01 01:00:00  Queens         405      5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
5      2015-01-01 01:00:00  Staten Island  6        5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
6      2015-01-01 01:00:00  NaN            4        5.0  10.0  30.0  7.0   1023.5  0.0    0.0    0.0    0.0  Y
7      2015-01-01 02:00:00  Bronx          120      3.0  10.0  30.0  6.0   1023.0  0.0    0.0    0.0    0.0  Y
8      2015-01-01 02:00:00  Brooklyn       1229     3.0  10.0  30.0  6.0   1023.0  0.0    0.0    0.0    0.0  Y
9      2015-01-01 02:00:00  EWR            0        3.0  10.0  30.0  6.0   1023.0  0.0    0.0    0.0    0.0  Y

Last rows

       pickup_dt            borough        pickups  spd  vsb   temp  dewp  slp     pcp01  pcp06  pcp24  sd   hday
29091  2015-06-30 22:00:00  Manhattan      4452     5.0  10.0  76.0  64.0  1011.9  0.0    0.0    0.0    0.0  N
29092  2015-06-30 22:00:00  Queens         556      5.0  10.0  76.0  64.0  1011.9  0.0    0.0    0.0    0.0  N
29093  2015-06-30 22:00:00  Staten Island  2        5.0  10.0  76.0  64.0  1011.9  0.0    0.0    0.0    0.0  N
29094  2015-06-30 23:00:00  Bronx          67       7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N
29095  2015-06-30 23:00:00  Brooklyn       990      7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N
29096  2015-06-30 23:00:00  EWR            0        7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N
29097  2015-06-30 23:00:00  Manhattan      3828     7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N
29098  2015-06-30 23:00:00  Queens         580      7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N
29099  2015-06-30 23:00:00  Staten Island  0        7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N
29100  2015-06-30 23:00:00  NaN            3        7.0  10.0  75.0  65.0  1011.8  0.0    0.0    0.0    0.0  N